Robust face-voice based speaker identity verification using multilevel fusion

Authors

Girija Chetty

Michael Wagner

Abstract

In this paper, we propose a robust multilevel fusion strategy involving cascaded multimodal fusion of audio–lip–face motion, correlation and depth features for biometric person authentication. The proposed approach combines the information from different audio–video based modules, namely an audio–lip motion module, an audio–lip correlation module, and a 2D + 3D motion-depth fusion module, and performs a hybrid cascaded fusion in an automatic, unsupervised and adaptive manner, by adapting to the local performance of each module. This is done by taking the output-score based reliability estimates (confidence measures) of each module into account. The module weightings are determined automatically such that the reliability measure of the combined scores is maximised. To test the robustness of the proposed approach, the audio and visual speech (mouth) modalities are degraded to emulate various levels of train/test mismatch, employing additive white Gaussian noise for the audio and JPEG compression for the video signals. The results show improved fusion performance for a range of tested levels of audio and video degradation, compared to the individual module performances. Experiments on the 3D stereovision database AVOZES show that, at severe levels of audio and video mismatch, the audio, mouth, 3D face, and tri-module (audio–lip motion, correlation and depth) fusion EERs were 42.9%, 32%, 15%, and 7.3%, respectively, for the biometric person authentication task. Crown copyright 2008. Published by Elsevier B.V. All rights reserved.
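As a rough illustration of the audio degradation mentioned in the abstract, the sketch below adds white Gaussian noise to a signal at a requested signal-to-noise ratio. It is only a minimal sketch under stated assumptions: the abstract does not give the SNR levels, sample rate, or any implementation details, so the function names `add_awgn` and `measured_snr_db`, the 6 dB level, and the 440 Hz test tone at 16 kHz are all hypothetical choices made for the example.

```python
import math
import random


def add_awgn(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Hypothetical helper: the noise is rescaled by its measured power so the
    resulting SNR matches snr_db for this particular noise draw.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Measure the SNR in dB of a noisy signal against its clean version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Assumed test input: one second of a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_awgn(clean, snr_db=6.0)
```

Lowering `snr_db` gives a more severe train/test mismatch for the audio module; the video degradation by JPEG compression is not covered by this sketch.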

Similar resources

Multifactor Fusion for Audio-Visual Speaker Recognition

In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audio-visual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion, involving a combination of feature-level fusion of lip-voice features and face-lip-voice features at score-level, is indeed a powerful technique for spe...

Video to the Rescue

Automatic person identity verification based on biometrics is a challenging problem, and has received much attention during recent years due to its many applications in on-line transaction processing, law enforcement, and security applications. However, most identity verification systems are primarily based on voice biometrics, and hence are more vulnerable to acoustic noise and channel distort...

Talking-face authentication

Numerous studies have exposed the limits of biometric identity verification based on a single modality (such as fingerprint, iris, handwritten signature, voice, face). The talking face modality, that includes both face recognition and speaker verification, is a natural choice for multimodal biometrics. Talking faces provide richer opportunities for verification than does any ordinary multimodal...

Speaking faces for face-voice spe

In this paper, we describe an approach for an animated speaking face synthesis and its application in modeling impostor/replay attack scenarios for face-voice based speaker verification systems. The speaking face reported here learns the spatiotemporal relationship between speech acoustics and MPEG4 compliant facial animation points. The influence of articulatory, perceptual, and prosodic acous...

Audio-visual multilevel fusion for speech and speaker recognition

In this paper we propose a robust audio-visual speech-and-speaker recognition system with liveness checks based on audio-visual fusion of audio-lip motion and depth features. The liveness verification feature added here guards the system against advanced spoofing attempts such as manufactured or replayed videos. For visual features, a new tensor-based representation of lip motion features, extra...

Journal:

Image Vision Comput.

Volume 26, Issue

Pages -

Publication date: 2008
